Vectorization and Parallelization of Loops in C/C++ Code
Authors
Abstract
Modern processors support the parallel execution of a program across their multiple cores, and they support vector operations through their extended SIMD instruction sets. To make a program run faster, the time-consuming loop computations in the program can often be parallelized and vectorized so that they exploit both the multiple cores and the SIMD instructions. In this paper, vector multiplication and matrix multiplication are used as examples to illustrate how to parallelize and vectorize loops in a C/C++ program with the Microsoft Visual C++ compiler and the GNU gcc (g++) compiler. An overview of the Intel...
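The paper's full worked listings are not reproduced on this page; the following is only a rough sketch of the technique the abstract describes, using OpenMP pragmas (accepted by both Visual C++ and gcc/g++) to spread the outer loop of a matrix multiplication across cores and to mark the unit-stride inner loop for SIMD vectorization. The function name and data layout are illustrative assumptions, not taken from the paper; a build such as g++ -O2 -fopenmp works directly, while Visual C++ honours #pragma omp simd only under /openmp:experimental and otherwise leaves the inner loop to its auto-vectorizer.

    // Illustrative sketch only (not the paper's listing): OpenMP-parallelized,
    // SIMD-vectorized matrix multiplication, C = A * B, row-major n x n.
    #include <vector>

    void matmul(const std::vector<double>& a, const std::vector<double>& b,
                std::vector<double>& c, int n)
    {
        // Distribute the rows of C across the available cores.
        #pragma omp parallel for
        for (int i = 0; i < n; ++i) {
            for (int j = 0; j < n; ++j)
                c[i * n + j] = 0.0;
            for (int k = 0; k < n; ++k) {
                const double aik = a[i * n + k];
                // Unit-stride inner loop: a good candidate for SIMD execution.
                #pragma omp simd
                for (int j = 0; j < n; ++j)
                    c[i * n + j] += aik * b[k * n + j];
            }
        }
    }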
Similar resources
Efficient Exploitation of Parallelism on Pentium III and Pentium 4 Processor-Based Systems
Systems based on the Pentium III and Pentium 4 processors enable the exploitation of parallelism at a fine- and medium-grained level. Dual- and quad-processor systems, for example, enable the exploitation of medium-grained parallelism by using multithreaded code that takes advantage of multiple control and arithmetic logic units. Streaming Single-Instruction-Multiple-Data (SIMD) extensions, on the o...
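Streaming SIMD Extensions (SSE) first appeared with the Pentium III and were widened on the Pentium 4. The fragment below is only an illustration of the fine-grained parallelism this abstract refers to, not code from that paper: it multiplies two float arrays four elements per instruction using SSE intrinsics.

    // Illustration of fine-grained (SIMD) parallelism with SSE intrinsics;
    // one SSE instruction operates on four 32-bit floats at once.
    #include <xmmintrin.h>   // SSE intrinsics, available since the Pentium III

    void mul_arrays_sse(const float* a, const float* b, float* c, int n)
    {
        int i = 0;
        for (; i + 4 <= n; i += 4) {
            __m128 va = _mm_loadu_ps(a + i);          // load a[i..i+3]
            __m128 vb = _mm_loadu_ps(b + i);          // load b[i..i+3]
            _mm_storeu_ps(c + i, _mm_mul_ps(va, vb)); // c[i..i+3] = a[i..i+3] * b[i..i+3]
        }
        for (; i < n; ++i)    // handle the scalar remainder
            c[i] = a[i] * b[i];
    }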
Vectorization and Parallelization of the Adaptive Mesh Refinement N-body Code
In this paper, we describe our vectorized and parallelized adaptive mesh refinement (AMR) N-body code with shared time steps, and report its performance on a Fujitsu VPP5000 vector-parallel supercomputer. Our AMR N-body code places hierarchical meshes recursively where higher resolution is required, and the time step of all particles is the same. The parts which are the most difficult to vectori...
Lecture Notes on Cache Iteration & Data Dependencies, 15-411: Compiler Design
Cache optimization can have a huge impact on program execution speed; for numerical programs it can bring speedups by a factor of 2 to 5. Loops are the parts of a program that are generally executed most often, which is why cache optimization usually focuses exclusively on handling loops. Especially for loops that execute very often, optimizing small chunks of source code can have a fairly significan...
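As a hedged illustration of the kind of loop transformation such lecture notes cover (the code is not taken from the notes), the sketch below tiles a matrix-multiplication loop nest so that the blocks currently being reused stay resident in cache; the tile size is a tunable assumption, and c is assumed to be zero-initialized.

    // Loop tiling (blocking): a standard cache optimization for loop nests.
    #include <algorithm>

    const int BS = 64;   // tile edge length, chosen to fit the cache (assumption)

    void matmul_tiled(const double* a, const double* b, double* c, int n)
    {
        for (int ii = 0; ii < n; ii += BS)
            for (int kk = 0; kk < n; kk += BS)
                for (int jj = 0; jj < n; jj += BS)
                    // Each pass touches only BS x BS tiles, so cached data
                    // is reused many times before it is evicted.
                    for (int i = ii; i < std::min(ii + BS, n); ++i)
                        for (int k = kk; k < std::min(kk + BS, n); ++k) {
                            const double aik = a[i * n + k];
                            for (int j = jj; j < std::min(jj + BS, n); ++j)
                                c[i * n + j] += aik * b[k * n + j];
                        }
    }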
A general compilation algorithm to parallelize and optimize counted loops with dynamic data-dependent bounds
We study the parallelizing compilation and loop nest optimization of an important class of programs where counted loops have a dynamically computed, data-dependent upper bound. Such loops are amenable to a wider set of transformations than general while loops with inductively defined termination conditions: for example, the substitution of closed forms for induction variables remains applicable...
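That paper's compilation algorithm is not reproduced here; the fragment below merely illustrates the loop class it targets: a counted loop whose upper bound is computed from data at run time but fixed before the loop starts, which keeps it amenable to transformations such as the OpenMP parallelization shown. Function and variable names are hypothetical, and the indices are assumed distinct so the iterations are independent.

    // A counted loop with a dynamically computed, data-dependent bound.
    // Unlike a general while loop, the trip count is known once the loop
    // begins, so the iterations can still be distributed across cores.
    #include <cmath>
    #include <vector>

    void scale_selected(std::vector<double>& data, const std::vector<int>& idx)
    {
        const int bound = static_cast<int>(idx.size()); // data-dependent bound

        #pragma omp parallel for
        for (int i = 0; i < bound; ++i)
            data[idx[i]] = std::sqrt(data[idx[i]]);
    }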
McFLAT: A Profile-Based Framework for MATLAB Loop Analysis and Transformations
Parallelization and optimization of the MATLAB programming language presents several challenges due to the dynamic nature of MATLAB. Since MATLAB does not have static type declarations, neither the shape and size of arrays nor the loop bounds are known at compile time. This means that many standard array dependence tests and associated transformations cannot be applied straightforwardly. On t...
Journal title:
Volume / Issue:
Pages: -
Publication date: 2017